@sunjiweiswift sunjiweiswift commented Sep 10, 2025

image

Missing features

1. GQA optimization: compute multiple Q heads that share a single K head in the same GEMM
2. Sliding window (done)
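For reviewers, a minimal sketch of the packing idea behind the GQA item above. It is written in plain Python rather than as the actual SYCL kernel, and every name in it (`matmul_nt`, `scores_per_head`, `scores_packed`) is hypothetical and not part of this PR. It assumes the Q heads of one group are stacked along the row (M) dimension, so a single GEMM against the shared K yields the same scores as one GEMM per head.

```python
def matmul_nt(a, b):
    """Return a @ b^T for row-major lists of lists (both have rows of length d)."""
    return [[sum(x * y for x, y in zip(row_a, row_b)) for row_b in b] for row_a in a]


def scores_per_head(q_heads, k):
    """Baseline: one (seq_q x d) @ (d x seq_k) GEMM per query head."""
    return [matmul_nt(q, k) for q in q_heads]


def scores_packed(q_heads, k):
    """Packed variant: stack all query heads of the group along M, run one GEMM."""
    seq_q = len(q_heads[0])
    # Rows of every head placed one after another: (num_heads * seq_q) x d.
    stacked = [row for q in q_heads for row in q]
    s = matmul_nt(stacked, k)
    # Split the result back into one (seq_q x seq_k) score matrix per head.
    return [s[h * seq_q:(h + 1) * seq_q] for h in range(len(q_heads))]


# Two query heads sharing one K head: seq_q = 2, seq_k = 3, head_dim = 2.
q_heads = [
    [[1, 0], [0, 1]],
    [[2, 1], [1, 2]],
]
k = [[1, 2], [3, 4], [5, 6]]

packed = scores_packed(q_heads, k)
baseline = scores_per_head(q_heads, k)
```

Both paths produce identical scores. The packed form only changes how the work is tiled, so K is loaded once per group instead of once per Q head.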

@sunjiweiswift
Author

@rolandschulz @tdeng5 @jiyang1011 please review

@tdeng5 self-requested a review September 10, 2025 09:23
@sunjiweiswift force-pushed the flash_chunk_prefill branch 8 times, most recently from 0505aed to 8c5d3ce September 17, 2025 02:48
@sunjiweiswift
Author

@rolandschulz @tdeng5 @jiyang1011 please review

@sunjiweiswift
Author

@rolandschulz pls review

Valentine233 and others added 9 commits September 30, 2025 23:40
This change imports `SYCLCompat` into the cutlass-sycl repo as `compat`.
Previous dependencies on `syclcompat` are changed to `compat`.
This PR also fixes some failures of `SYCLCompat` in oneAPI 2025.2.

---------

Co-authored-by: Roland Schulz <[email protected]>
@sunjiweiswift
Author

@Antonyvance pls review again~

@Antonyvance

@sunjiweiswift I believe this needs to be reimplemented based on PR 547. Would you be able to adopt it?

@sunjiweiswift
Author

> @sunjiweiswift I believe this needs to be reimplemented based on PR 547. Would you be able to adopt it?

Can I merge this first? sglang-xpu already uses this kernel. However, its third-party dependency currently points to my forked repo, so it can't use the public repo yet. The new API will be available once PR 547 is merged, and I will adapt this kernel to it in a new PR.

Labels

redesign required: Implementation requires a redesign

5 participants